This is feedback on writeups from previous years.



Firstly 'sampling error' is one of those unfortunate terms which has a technical meaning slightly different from the everyday meaning of the words. It does NOT mean the error due to some bias in your collection (e.g. someone who tended to pick up brown snails more than pink, only collected in the woodland). That would generally be called a sampling BIAS (nb your experimental design might get over this by having the same people collect at each site - so the proportions may still be biased, but any difference between populations would still tell you that the underlying frequencies had changed).

Sampling error in its technical sense is the difference between your frequency estimate and the true value (http://en.wikipedia.org/wiki/Sampling_error). If you see a significant difference, then the WHOLE POINT OF THE TEST IS that the differences in frequency you observed are unlikely to be due to sampling error. The P-value is the chance of getting a chi-squared value that big (or greater) if the null hypothesis were true; IF the test is significant, that chance is less than your significance threshold (conventionally 0.05). Hence you should not have attributed a significant result to a small sample size (or at least you should have said it is unlikely).

Be careful; the P-value IS NOT the chance of the null hypothesis being true, nor of the alternative. Nb we don't say we accept the null hypothesis if we have a large (non-significant) P-value; we say that we have failed to reject it. This may seem subtle, but it is an important point. We say this because there are two reasons why we may get this outcome: 1) the null may be true, or 2) we may have insufficient evidence to detect a real difference.
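
To see what sampling error alone can do, here is a minimal sketch (not part of the original exercise). It repeatedly draws two samples from populations with IDENTICAL true morph frequencies and counts how often the chi-squared test comes out 'significant'. The morph frequencies, the sample size of 100 and the function names are all made up for illustration; 5.991 is the standard 5% critical value for 2 degrees of freedom.

```python
import random

CRITICAL_5PCT_DF2 = 5.991  # chi-squared critical value, 2 d.f., 5% level


def chi_squared(table):
    """Chi-squared statistic for a table of counts (rows = samples, columns = morphs)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip a morph that nobody found at all
                stat += (observed - expected) ** 2 / expected
    return stat


def draw_sample(freqs, n, rng):
    """Collect n snails at random from a population with the given true frequencies."""
    counts = [0] * len(freqs)
    for morph in rng.choices(range(len(freqs)), weights=freqs, k=n):
        counts[morph] += 1
    return counts


def false_positive_rate(freqs, n, trials, seed=1):
    """Proportion of tests that are 'significant' although the populations are identical."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        table = [draw_sample(freqs, n, rng), draw_sample(freqs, n, rng)]
        if chi_squared(table) > CRITICAL_5PCT_DF2:
            hits += 1
    return hits / trials


# Hypothetical true frequencies of yellow, pink and brown in BOTH populations.
rate = false_positive_rate([0.5, 0.3, 0.2], n=100, trials=2000)
print(f"Proportion of significant tests with identical populations: {rate:.3f}")
```

The proportion should come out close to 0.05: when the null hypothesis is true, sampling error by itself produces a 'significant' result only about one time in twenty. That is why a significant test is poor grounds for blaming sampling error.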

Next, sampling error is different from 'genetic drift'. Genetic drift can CAUSE SIGNIFICANT DIFFERENCES IN ALLELE FREQUENCY. It will be going on in all populations. THE WHOLE POINT of the exercise was to try to show whether the differences in allele frequency from one population to another were due to genetic drift or some other process. Let's take an experimental design where we had 8 samples in two different valleys, with scrub (S) and grass (G).


Valley 1
S S
G G

Valley 2
S S
G G


Now we might get a result like this, where P indicates more pink & brown and Y indicates more yellow.

Valley 1
P P
Y Y

Valley 2
Y Y
P P

That seems better explained by drift and gene flow as there is no consistent trend with habitat, but the neighbouring populations are similar.
However, it could be a result of selection, if we assume that our division of the habitat into grass and scrub was not the relevant one for the snails.
With less gene flow we might get other patterns. E.g.

Valley 1
P Y
Y P

Valley 2
P P
P Y

This pattern also seems better explained by drift than by selection.
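
If it is hard to picture how drift alone could produce a patchwork like this, the following sketch may help. It is an illustration under assumed conditions, not a model of the real site: a simple Wright-Fisher style simulation of eight isolated populations, with no selection and no gene flow, all starting at the same allele frequency. The population size of 50 and the 100 generations are arbitrary choices.

```python
import random


def drift(p0, pop_size, generations, rng):
    """Allele frequency after neutral drift in one isolated diploid population."""
    p = p0
    copies = 2 * pop_size  # two gene copies per diploid snail
    for _gen in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current gene pool: no selection, no migration.
        count = sum(1 for _copy in range(copies) if rng.random() < p)
        p = count / copies
    return p


rng = random.Random(42)
# Eight replicate populations, all starting with the allele at frequency 0.5.
final = [drift(0.5, pop_size=50, generations=100, rng=rng) for _site in range(8)]
print([round(p, 2) for p in final])
```

The eight populations typically end up at quite different frequencies (some may even have lost or fixed the allele), with no relationship to habitat, because habitat does not appear anywhere in the simulation.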


With more gene flow the difference between adjacent populations may be REDUCED irrespective of habitat. E.g.

Valley 1
P P
P P

Valley 2
Y Y
Y Y

Gene flow can also produce differences between populations in the same habitat if some populations are adjacent to a different habitat.
E.g. with habitat

S G G
S G G

You might get phenotypes
P P Y
P P Y

Make sure you understand both these effects of gene flow
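
The homogenising effect of gene flow can be sketched with a very simple calculation. This assumes two populations that exchange a fraction m of their genes each generation, with no drift and no selection; m and the starting frequencies are hypothetical values chosen for illustration.

```python
def migrate(p1, p2, m, generations):
    """Allele frequencies in two populations exchanging a fraction m of genes per generation."""
    for _gen in range(generations):
        # Each population keeps (1 - m) of its own genes and receives m from the other.
        p1, p2 = (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1
    return p1, p2


# Hypothetical start: the allele is common in one population and rare in its neighbour.
for m in (0.0, 0.01, 0.1):
    p1, p2 = migrate(0.9, 0.1, m, generations=10)
    print(f"m = {m}: difference after 10 generations = {p1 - p2:.3f}")
```

In this sketch the difference between the two populations shrinks by a factor of (1 - 2m) every generation, so even modest gene flow soon makes neighbours alike, whatever their habitat. In a real population selection or drift would be pulling the frequencies apart at the same time.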

It is VERY IMPORTANT to be able to see how we can draw these different conclusions.
It is NOT enough to simply say that more samples increase the validity of the results... that is a rather basic GCSE-level way of talking about sampling error and sample size. For our purposes you need to be able to show how taking samples from different habitats in different spatial distributions enables us to detect the different effects of gene flow, drift and selection.

Other points

Some of you did not ensure that your expected values were always greater than 3, by combining phenotypic categories. Which categories to combine is your call: you need to use your biological intuition to decide which categories are 'most similar'.

Do NOT round your expected values to whole numbers.
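
As a sketch of the two points above: expected values are calculated as (row total x column total) / grand total and kept as decimals, and a category with a small expected value is merged with another. The counts are invented, and merging brown with pink is just one possible choice, made here for illustration.

```python
def expected_values(table):
    """Expected counts: row total x column total / grand total (NOT rounded)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def merge_columns(table, keep, absorb):
    """Add the counts in column `absorb` to column `keep`, then drop `absorb`."""
    merged = []
    for row in table:
        new_row = list(row)
        new_row[keep] += new_row[absorb]
        del new_row[absorb]
        merged.append(new_row)
    return merged


# Hypothetical counts: columns are yellow, pink, brown; rows are two sites.
observed = [[30, 18, 2],
            [25, 22, 3]]
expected = expected_values(observed)

# Brown has an expected value of 2.5 at both sites, which is below 3,
# so here it is pooled with pink before the test is done.
if min(min(row) for row in expected) < 3:
    observed = merge_columns(observed, keep=1, absorb=2)
    expected = expected_values(observed)

print(observed)
print(expected)
```

Note that the expected values still add up to the same total as the observed counts after merging; if yours do not, something has gone wrong in the calculation.
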
Polymorphism is usually considered to be a property of populations (not of a species). E.g. a species can contain some populations that are polymorphic and others that are not.
You may find some definitions of polymorphism that talk about it being a property of the phenotype, but in modern usage the term is more often used to mean the existence of multiple alleles at a locus (even if they have no effect on the phenotype).


Earlier years


The best experimental designs included REPLICATION - i.e. multiple collections from the same habitat type at the same altitude. If there was no significant variation between such replicates, that would make it much more convincing that any differences you found between different habitats were due to the action of selection (especially if the spacing was the same in both cases).

If on the other hand there were differences between these replicates from the same habitat, that would imply some form of selection due to an environmental difference you did not identify, or that drift was having a substantial effect, or gene flow from nearby habitats with a different selective regime.

In the cases where such replicates showed no significant difference, you could bulk the samples to get larger sample sizes and greater statistical power.
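
The logic of testing the replicates first and then bulking them can be sketched as follows. The counts are invented for illustration, and the critical values are the standard 5% values for 1 to 4 degrees of freedom.

```python
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}  # 5% critical values by d.f.


def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for the WHOLE table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def pool(samples):
    """Bulk replicate samples by adding up the counts of each morph."""
    return [sum(col) for col in zip(*samples)]


# Hypothetical counts of yellow, pink, brown in two replicates per habitat.
scrub_reps = [[20, 15, 5], [22, 14, 6]]
grass_reps = [[10, 12, 18], [12, 10, 20]]

# Step 1: do the replicates from the SAME habitat differ from each other?
for name, reps in (("scrub", scrub_reps), ("grass", grass_reps)):
    stat, df = chi_squared(reps)
    print(name, "replicates:", round(stat, 2), "significant" if stat > CRITICAL_5PCT[df] else "not significant")

# Step 2: if not, bulk them and compare the two habitats.
between_stat, between_df = chi_squared([pool(scrub_reps), pool(grass_reps)])
print("scrub vs grass:", round(between_stat, 2), "critical value:", CRITICAL_5PCT[between_df])
```

With these made-up numbers the replicates do not differ significantly within a habitat, while the bulked samples differ strongly between habitats. Remember that the test only tells you that the proportions differ; the argument for selection comes from the design, i.e. the consistency of the replicates.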

Next, almost all of you (some honourable exceptions) just compared the 3 main background colours. When I did this in the workshops that was an ILLUSTRATION. There are many different ways of dividing up your samples: banding, dark vs light forms (putting 5 banded in with the dark backgrounds) etc. You might have tried out some of those.

Now on to the specific comments.

1) Do not EVER use 'dominant' to mean 'frequent' - since 'dominant' has a specific meaning in genetics.
2) Genera have capitals hence we write Cepaea nemoralis (NOT cepaea nemoralis)
3) You do not know that there has been little human interference at this site. In fact there has been intensive management, including cutting down of scrub over most of the area.
4) Comparing observed with expected within woodland does NOT evaluate whether dark snails are frequent in the woodland. It is a test of whether the different woodland samples have different frequencies of the various phenotypes
5) The point about snails moving slowly is that the genetic differences will build up over short distances, so we can observe them in one locality, rather than having to travel all over the country.
6) We would expect genes that confer a FITNESS advantage (in some genotypes) to spread; 'desirable' is ambiguous.
7) You should have presented your Chi square calculations
8) Gene flow would be expected to produce similar allele frequencies in the populations between which the flow has been occurring.
9) The ratio of live:dead snails is difficult to interpret - all snails die eventually, and leave their shells behind. If you find many dead shells, that might simply mean that the living population is particularly dense in that area (or has been in the past). One interesting comparison is between the allele frequencies in live and dead snails... a difference might indicate drift, or a change in habitat (as you are comparing past populations with the present).
10) Polymorphism is the presence of multiple alleles in one population. Different phenotypes might be due to a direct effect of the environment, not a genetic difference.
11) We cannot be sure that shell phenotype affects survival - that is a hypothesis.
12) Gene flow is the movement of genes from one population to another. It does not refer to the transmission of genes from one generation to the next in the same population.
13) Genetic drift is not deadly, though it can lead to the increase in frequency of deleterious alleles.
14) You should have counted all the bands, not just the larger ones.
15) The best experimental designs included REPLICATION, in particular collections from the same habitat type at the same altitude. If there was no significant variation between such replicates, that would make it much more convincing that any differences you found between different habitats were due to the action of selection. If on the other hand there were differences between these replicates, that would imply some form of selection due to an environmental difference you did not identify, or that drift was having a substantial effect, or gene flow from nearby habitats with a different selective regime.
16) Polymorphism means 'many morphs' hence the singular is 'morph' not 'polymorph'.
17) The morphs present will reflect the action of selection over many seasons over many preceding generations - because the snails alive then were the ancestors who transmitted their alleles to the current generation
18) A chi squared test evaluates whether the difference in frequencies is statistically significant. If it is, that suggests there is actually a difference in frequency between the populations - although it still remains possible (if unlikely) that the frequencies in the populations are in fact the same, but your sample just happened to be very different.
If the test reports that it is not significant, the difference between samples might be due to sampling error (nevertheless there might be a real difference between the populations, just one not detected by the present samples).
19) Evidence of selection indicates that the different phenotypes propagate their genes with different success. That is NOT evidence regarding the survival of the SPECIES (Cepaea nemoralis). For example, selection can go on in a growing population because some phenotypes leave more offspring than others, even though all phenotypes are producing many offspring each.
20) A design with collections of one sample from each of 3 different habitats at the top of the hill and 3 different habitats at the bottom of the hill does not allow evaluation of the effect of genetic drift (as there are no replicates of the same habitat).
21) The existence of polymorphism does NOT NECESSARILY mean that no allele has a selective advantage - for example, one allele (say the one for yellow shells) might be in the process of slowly spreading through the population.
22) Be careful with the term 'disruptive selection'. It is often used to describe selection for extreme phenotypes of a continuous trait (e.g. for tall and short individuals and against intermediate sizes). It is also sometimes used to describe selection for different genotypes in different locations, in which case it can lead to polymorphism, but ONLY if there is some gene flow that mixes the genes originating from the different locations.
23) The singular of species is species (NOT specie)
24) Cepaea hortensis was not present on the site, so its interaction with Cepaea nemoralis did not require extensive mention in the introduction.
25) Cepaea do not have 'visual selection'. Some predators prey on Cepaea using their vision to detect them (e.g. thrushes).
26) If thrushes wipe out a Cepaea population, there will be no selection on the allele frequencies, as none of the snails contributes anything to the next generation, whatever its phenotype.
27) Climatic fluctuations only reduce the effective population size if they lead to crashes in the population size.
28) You should state your null hypotheses
30) If there is no significant difference in phenotype frequencies of the replicates from the same habitat, that suggests the effect of genetic drift is mild. Such a test should have been done and presented in your results.
31) Extensive gene flow would have caused allele frequencies to be alike in the populations which were connected by this gene flow.
32) Either selection, or genetic drift, or both can cause differences in allele frequencies between any pair of populations (whether they are in similar habitat or not) as long as there is not extensive gene flow between them.
33) If you provide a map, show the location of your sites!
34) You should evaluate the Chi square value for the whole table, not treat the different categories (colours) separately
35) It is unclear how Cepaea polymorphism relates to viruses
36) It is unclear what is meant by 'reliable' samples. Large? Randomly selected from among the existing individuals?
37) If an allele becomes fixed by drift, polymorphism will have been eliminated
38) Shrubs near woodland do not indicate gene flow in snails
39) We do not know for sure that it is colder in the valley bottom. It might be a sun trap, for example.
40) We do not know for sure that there is no gene flow from the top of the hill to the bottom (especially over several generations). You might argue (giving reasons) that it is unlikely.
41) We cannot assume that we KNOW which colours are favoured by selection in which habitats. We might propose a hypothesis, on the basis of previous results or some argument about thermal biology, for example.
42) Snails have a low rate of movement, but their distribution is very large (across most of Europe)
43) You cannot detect drift WITHIN a population by using a chi square test, when you sample on one day. You can only test for differences between populations - which might be due to drift or other causes such as selection. Actually you might have had a go at getting a sample from two different time points in the same population by comparing live and dead shells (the dead ones being mostly from previous generations).
44) Selection is implicated if the same trend in allele frequencies is seen in different populations with the same habitat.
45) If two or more populations have the same allele frequency, that could be because they were connected by substantial gene flow which had evened out the allele frequency. However, it could also be because the same selective regime operated at each site, or that there had been insufficient time since the area was colonised for the frequencies to have diverged.
46) Late hand-in
47) With the chi squared test that we were doing, the Ho is that the frequencies are the same in different populations.
48) Expected values should be calculated to at least two decimal places (preferably more).
49) With only one large population at the top of the hill and one at the bottom, it is nearly impossible to distinguish the effects of drift, gene flow and selection.
50) The environment does not change the genotype. It may change the selective regime, which over time may lead to differential success of different genotypes.
51) If two nearby sites have different allele frequencies due to genetic drift, that would suggest that gene flow has been insufficient over that distance to homogenise allele frequencies. That lesson should be applied to other pairs of sites similarly separated.
52) If the chi squared value is smaller than the critical value, we have no evidence for a difference in allele frequency - but one may still be there (albeit probably a small one)
53) Do not use 'isn't' in scientific writing (use 'is not').
54) Read the book "Eats, Shoots & Leaves" to understand the difference between a colon and a semicolon (and much else).
55) Doing controlled experiments that change the snails' allele frequencies is a good idea.
56) The chi square tests you did tested for differences in proportions of each phenotype (not the absolute number)
57) We do not really have evidence to explain the overall frequency of each colour, only the differences in proportion between populations.
58) If expected values are less than 3 in any cell, adjacent categories of colour (etc) should have been merged to rectify this situation.
59) Genetic drift can cause the increase or even fixation of a disadvantageous allele, not just advantageous ones
60) Expected values should add up to the same total as the observed, otherwise you have made a serious error
61) If you compared banding, you could have compared colours too (and vice versa).
62) We don't know how extensive gene flow actually is
63) Even when humans can discern no difference between the environment at two sites (e.g. two woodland areas) there may be important differences which impose different selection regimes on the polymorphism
64) We do not normally exclude categories from a chi square if expected <3, we bulk the small category with the one we consider the most similar.
65) A chi squared test cannot tell you if the difference between two populations is due to drift or selection, only that there is a significant difference in proportions of the phenotypes.
66) Polymorphism is not a mechanism. It is an outcome of various PROCESSES (mutation, drift, gene flow and selection)